Automatically Selecting Domain Markers for Terminology Extraction
Abstract
Some approaches to automatic terminology extraction from corpora rely on existing semantic resources to guide the detection of terms. Most of these systems exploit specialised resources, such as UMLS in the medical domain, while a few try to take advantage of general-purpose semantic resources, such as EuroWordNet (EWN). Because term extraction is clearly domain-dependent, using a general-purpose resource that lacks specific domain information requires a way of attaching domain information to the units of the resource. For large resources, it is desirable that this semantic enrichment be carried out automatically. Given a specific domain, our proposal aims to detect in EWN those units that can be considered domain markers (DMs). We define a DM as an EWN entry whose attached strings belong to the domain, as do the variants of all its descendants through the hyponymy relation. The procedure we propose in this paper is fully automatic and, a priori, domain-independent. The only external knowledge it uses is a set of terms (an external vocabulary), each of which is considered to have at least one sense belonging to the
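The domain-marker definition given in the abstract can be illustrated with a minimal sketch. It assumes a strict reading (every variant of the entry and of all its hyponymy descendants must appear in the domain vocabulary) and uses a hypothetical toy hierarchy in place of EWN; the names `VARIANTS`, `HYPONYMS`, `is_domain_marker`, and `domain_markers` are illustrative and are not taken from the paper, whose actual selection procedure is not described in this excerpt.

```python
from typing import Dict, List, Set

# Hypothetical toy data standing in for EWN: synset id -> attached strings (variants).
VARIANTS: Dict[str, List[str]] = {
    "entity": ["entity"],
    "disease": ["disease"],
    "infection": ["infection"],
    "hepatitis": ["hepatitis"],
    "bank": ["bank"],
}

# Hypothetical hyponymy relation: synset id -> direct hyponyms.
HYPONYMS: Dict[str, List[str]] = {
    "entity": ["disease", "bank"],
    "disease": ["infection"],
    "infection": ["hepatitis"],
}


def descendants(synset: str, hyponyms: Dict[str, List[str]]) -> Set[str]:
    """Collect all synsets reachable from `synset` through hyponymy."""
    seen: Set[str] = set()
    stack = list(hyponyms.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(hyponyms.get(node, []))
    return seen


def is_domain_marker(
    synset: str,
    variants: Dict[str, List[str]],
    hyponyms: Dict[str, List[str]],
    domain_terms: Set[str],
) -> bool:
    """Strict reading (an assumption): all variants of the synset and of
    every descendant must belong to the domain vocabulary."""
    nodes = {synset} | descendants(synset, hyponyms)
    return all(v in domain_terms for n in nodes for v in variants.get(n, []))


def domain_markers(
    variants: Dict[str, List[str]],
    hyponyms: Dict[str, List[str]],
    domain_terms: Set[str],
) -> List[str]:
    """Return every synset that satisfies the strict domain-marker test."""
    return sorted(
        s for s in variants if is_domain_marker(s, variants, hyponyms, domain_terms)
    )


if __name__ == "__main__":
    medical_terms = {"disease", "infection", "hepatitis"}
    print(domain_markers(VARIANTS, HYPONYMS, medical_terms))
```

In this toy setting, "entity" is rejected because one of its descendants ("bank") lies outside the vocabulary, while "disease" and everything below it qualify. A real system would likely relax the all-or-nothing test, since an external vocabulary rarely covers every variant.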
Similar resources
A Way to Automatically Enrich Biomedical Ontologies
Biomedical ontologies play an important role in information extraction in the biomedical domain. We present a four-step workflow for automatically updating biomedical ontologies. We detail two contributions concerning concept extraction and the semantic linkage of extracted terminology.
Learning IE patterns: a terminology extraction perspective
The large-scale applicability of knowledge-based information access systems, such as those based on Information Extraction techniques, strongly depends on the possibility of automatically acquiring the large amount of knowledge required. However, the basic assumption of the IE paradigm, i.e. that the information need is known in advance, inherently limits its applicability since the resulting ...
A Free Terminology Extraction Suite
In this paper we will present a set of terminology extraction tools that are distributed under a Free Software License, so that users can freely download, use, distribute and modify them to meet their needs. The tools are mainly programmed in Perl and they will work under different platforms, such as Windows or Linux. These terminology extraction tools will help freelance translators, translati...
Keyword Extraction using Term-Domain Interdependence for Dictation of Radio News
In this paper, we propose a keyword extraction method for dictation of radio news which consists of several domains. In our method, newspaper articles which are automatically classified into suitable domains are used in order to calculate feature vectors. The feature vectors show term-domain interdependence and are used for selecting a suitable domain of each part of radio news.
Keyword extraction of radio news using domain identification based on categories of an encyclopedia
In this paper, we propose a keyword extraction method for dictation of radio news which consists of several domains. In our method, newspaper articles which are automatically classified into suitable domains are used in order to calculate feature vectors. The feature vectors show term-domain interdependence and are used for selecting a suitable domain of each part of radio news.
Publication date: 2004